libtcb: skip privilege drop in unprivileged user namespaces#38
Blarse wants to merge 2 commits into openwall:main
|
Thank you @Blarse! Are these changes already in use in ALT's package perhaps? I was concerned at first that the Another approach I'd consider is only checking for |
"The unknown" includes possible future changes to the kernel, where the |
Detect /proc/self/setgroups == "deny" to recognize an unprivileged user namespace where setgroups(2) is permanently denied by the kernel. No-op on non-Linux.
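For illustration, the detection described in this commit message could look roughly like the sketch below. It is not the code from the patch: the names `setgroups_state_allows`, `setgroups_allowed_at` and `setgroups_allowed_sketch` are made up here, and treating a missing or unreadable file as "allowed" (that is, keeping the old behaviour) is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Returns 0 only for the literal state "deny" (with or without the
 * trailing newline the kernel emits); anything else counts as allowed. */
static int setgroups_state_allows(const char *state)
{
	return strcmp(state, "deny") != 0 && strcmp(state, "deny\n") != 0;
}

/* Reads the state from the given path. Assumption: if the file cannot be
 * opened or read, report "allowed" so that callers behave as before. */
static int setgroups_allowed_at(const char *path)
{
#ifdef __linux__
	char buf[16];
	int allowed = 1;
	FILE *f = fopen(path, "r");

	if (f == NULL)
		return 1;
	if (fgets(buf, sizeof(buf), f) != NULL)
		allowed = setgroups_state_allows(buf);
	fclose(f);
	return allowed;
#else
	/* No-op on non-Linux: there is no such file to consult. */
	(void)path;
	return 1;
#endif
}

/* Hypothetical wrapper for the current process. */
static int setgroups_allowed_sketch(void)
{
	return setgroups_allowed_at("/proc/self/setgroups");
}
```

Taking the path as a parameter is only there so the parsing can be exercised against an ordinary file; a real helper would probably hard-code `/proc/self/setgroups`.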
When /proc/self/setgroups is "deny" we cannot call setgroups(2), and euid 0 inside such a namespace does not carry real privileges anyway. Treat this case the same as running non-root: mark the privs structure with PRIV_MAGIC_NONROOT and return success. Fixes failures of pam_tcb, libnss_tcb, tcb_unconvert and shadow's shadowtcb_drop_priv() when running under a rootless container.
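The control flow this commit message describes might be sketched as follows. Everything here is a stand-in: `struct sketch_privs`, the `SKETCH_MAGIC_*` values and `sketch_drop_priv()` are hypothetical and do not reproduce libtcb's `struct tcb_privs`, `PRIV_MAGIC_NONROOT` or `tcb_drop_priv_r()`. The sketch also takes the target uid/gid directly and uses the ordinary libc calls, which is a simplification.

```c
#define _GNU_SOURCE /* for the setgroups() declaration in <grp.h> */
#include <grp.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-ins; the real definitions live in libtcb. */
#define SKETCH_MAGIC_ROOT    0x1UL
#define SKETCH_MAGIC_NONROOT 0x2UL
#define SKETCH_NGROUPS       32

struct sketch_privs {
	unsigned long magic;
	gid_t grplist[SKETCH_NGROUPS];
	int number_of_groups;
	gid_t old_gid;
	uid_t old_uid;
};

/* setgroups_ok is the result of a check such as reading
 * /proc/self/setgroups (1 = allowed, 0 = "deny"). */
static int sketch_drop_priv(struct sketch_privs *p, int setgroups_ok,
                            uid_t target_uid, gid_t target_gid)
{
	if (geteuid() != 0 || !setgroups_ok) {
		/* Plain user, or "root" in a userns where setgroups is
		 * denied: nothing to drop, so record that and succeed. */
		p->magic = SKETCH_MAGIC_NONROOT;
		return 0;
	}

	/* Save the current state so it could be restored later. */
	p->number_of_groups = getgroups(SKETCH_NGROUPS, p->grplist);
	if (p->number_of_groups < 0)
		return -1;
	p->old_gid = getegid();
	p->old_uid = geteuid();

	/* Error handling is kept minimal; real code would have to undo
	 * earlier steps when a later one fails. */
	if (setgroups(0, NULL) < 0)
		return -1;
	if (setegid(target_gid) < 0)
		return -1;
	if (seteuid(target_uid) < 0)
		return -1;

	p->magic = SKETCH_MAGIC_ROOT;
	return 0;
}
```

With `setgroups_ok` equal to 0 the function returns before touching any credentials, which is the same early exit a non-root caller takes.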
|
Hi,
Not yet, but I'm working on porting mkosi to ALT Linux and stumbled upon a systemd-sysusers segfault caused by this. I can confirm that this patch fixes it. Apparently, when useradd invoked libtcb and libtcb could not drop privileges after a failed setgroups() call, it left an empty shadow file, which in turn triggered another issue in ALT's downstream patch to systemd-sysusers that adds tcb support.
(For reference, virtiofsd hit the same setgroups + user namespace issue and applied a similar fix in !207.)
|
9f42bc3 to ad67dc3
When an unprivileged user sets up a user namespace without newuidmap, it must first write "deny" to /proc/self/setgroups before writing gid_map, after which setgroups() becomes permanently forbidden[1]. tcb_drop_priv_r() then trips over sys_setgroups(0, NULL) and bails out with EPERM, even though euid 0 inside the namespace is not real root and there is nothing to drop in the first place.
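The sequence above can be reproduced with a small probe, sketched here under the assumption that unprivileged user namespaces are enabled on the host. The function names and the 0/1/2 result convention are invented for this example; it forks so that the caller's own namespace is left alone.

```c
#define _GNU_SOURCE /* for unshare() and CLONE_NEWUSER */
#include <errno.h>
#include <fcntl.h>
#include <grp.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write a short string to a /proc file; returns 0 on success. */
static int write_file(const char *path, const char *text)
{
	int fd = open(path, O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, text, strlen(text));
	close(fd);
	return n == (ssize_t)strlen(text) ? 0 : -1;
}

/* Runs in the child. Result: 0 = setgroups() failed with EPERM,
 * 1 = any other outcome, 2 = the namespace could not be set up. */
static int child_probe(uid_t uid, gid_t gid)
{
	char map[64];

	if (unshare(CLONE_NEWUSER) < 0)
		return 2;
	/* Without newuidmap, "deny" has to be written before gid_map. */
	if (write_file("/proc/self/setgroups", "deny") < 0)
		return 2;
	snprintf(map, sizeof(map), "0 %lu 1", (unsigned long)gid);
	if (write_file("/proc/self/gid_map", map) < 0)
		return 2;
	snprintf(map, sizeof(map), "0 %lu 1", (unsigned long)uid);
	if (write_file("/proc/self/uid_map", map) < 0)
		return 2;

	/* euid is now 0 inside the namespace, yet this call is refused. */
	if (setgroups(0, NULL) == 0)
		return 1;
	return errno == EPERM ? 0 : 1;
}

/* Forks, runs the probe in the child and returns its result. */
static int probe_setgroups_in_userns(void)
{
	int status;
	uid_t uid = geteuid(); /* captured before unshare() changes the view */
	gid_t gid = getegid();
	pid_t pid = fork();

	if (pid < 0)
		return 2;
	if (pid == 0)
		_exit(child_probe(uid, gid));
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return 2;
	return WEXITSTATUS(status);
}
```

A result of 0 corresponds to the EPERM that `sys_setgroups(0, NULL)` runs into; a result of 2 only means the environment (seccomp, sysctl, missing /proc) does not allow the namespace setup.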
In my case this breaks the mkosi tool, which builds images inside an unprivileged userns, where any tcb-aware utility aborts on the setgroups step. The patch detects the "deny" state via a small setgroups_allowed() helper and short-circuits tcb_drop_priv_r() with PRIV_MAGIC_NONROOT, the same path already taken for non-root callers. No behavioural change outside user namespaces; non-Linux builds get a stub that always returns 1.
[1] https://man7.org/linux/man-pages/man7/user_namespaces.7.html